Merge#

Merge two or more tabular datapackages. CSV files whose paths match across the datapackages are appended together, and the merged file includes the fields from every matching CSV file.

Usage Example#

This is a simple example of merging two datapackages, where the second datapackage adds a field and a new file.

from datapackage_convert import merge_datapackage

merge_datapackage('/tmp/output', ['base_datapackage', 'add_field_and_file'])

base_datapackage#

The base_datapackage directory in the above example looks like:

├── base_datapackage
│   ├── csv
│   │   └── games.csv
│   └── datapackage.json

And games.csv looks like:

| id | title      |
|----|------------|
| 1  | game1_base |
| 2  | game2_base |
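To try the example locally, the base_datapackage layout above can be created with the Python standard library. This is only a sketch: `write_base_datapackage` is a hypothetical helper, and the descriptor contents (package name, resource name, and the integer/string field types) are assumptions based on the Tabular Data Package format, not a copy of the original datapackage.json.

```python
import csv
import json
from pathlib import Path


def write_base_datapackage(root):
    """Create base_datapackage/ with csv/games.csv and datapackage.json."""
    package_dir = Path(root) / "base_datapackage"
    csv_dir = package_dir / "csv"
    csv_dir.mkdir(parents=True, exist_ok=True)

    # The two rows shown for games.csv in this section.
    with open(csv_dir / "games.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "title"])
        writer.writerows([[1, "game1_base"], [2, "game2_base"]])

    # Assumed descriptor: one resource per CSV file, each with a table schema.
    descriptor = {
        "name": "base_datapackage",
        "resources": [
            {
                "name": "games",
                "path": "csv/games.csv",
                "schema": {
                    "fields": [
                        {"name": "id", "type": "integer"},
                        {"name": "title", "type": "string"},
                    ]
                },
            }
        ],
    }
    (package_dir / "datapackage.json").write_text(json.dumps(descriptor, indent=2))
    return package_dir
```

Calling `write_base_datapackage('.')` in the working directory produces a folder that matches the tree shown for base_datapackage.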

add_field_and_file datapackage#

We are merging in the following datapackage:

├── add_field_and_file
│   ├── csv
│   │   ├── games.csv
│   │   └── apps.csv
│   └── datapackage.json

And games.csv looks like:

| id | title2          |
|----|-----------------|
| 3  | game1_add_field |

result datapackage#

After merging the output folder looks like:

├── output
│   ├── csv
│   │   ├── games.csv
│   │   └── apps.csv
│   └── datapackage.json

The apps.csv file is the same as in add_field_and_file, since it appears only in that datapackage. The merged games.csv looks like:

| id | title      | title2          |
|----|------------|-----------------|
| 1  | game1_base |                 |
| 2  | game2_base |                 |
| 3  |            | game1_add_field |

As you can see, the games.csv files have been merged and the fields from both are present. title exists only in the first datapackage and title2 only in the second. Once merged, both columns exist, and a cell is left blank wherever the source file did not have that field.
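The append-and-union behaviour described here can be illustrated with the standard csv module. This is a minimal sketch of the idea, not the implementation used by merge_datapackage; `merge_csv_texts` is a hypothetical helper that works on CSV text held in memory rather than on files.

```python
import csv
import io


def merge_csv_texts(csv_texts):
    """Append CSV tables, keeping the union of their columns in first-seen order."""
    fieldnames = []
    rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        # Collect any column names not seen in earlier files.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        rows.extend(reader)

    out = io.StringIO()
    # restval="" leaves a cell blank when a row has no value for that column.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


base = "id,title\n1,game1_base\n2,game2_base\n"
added = "id,title2\n3,game1_add_field\n"
print(merge_csv_texts([base, added]))
# id,title,title2
# 1,game1_base,
# 2,game2_base,
# 3,,game1_add_field
```

Rows from the first file have an empty title2 cell and the row from the second file has an empty title cell, matching the merged games.csv described here.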

Options#

delete_input_csv#

This deletes the input CSV files from the original datapackages once they have been merged. This is useful if the files are large and you only want to keep the merged datapackage.

Example

merge_datapackage('/tmp/output', ['base_datapackage', 'add_field_and_file'], delete_input_csv=True)

Notes and Caveats#

A new datapackage.json is created after merging.

If two or more files are merged and a field appears in more than one of them with different types, the new datapackage.json falls back to declaring that field as a string.
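The string fallback can be sketched as follows. This is an illustration of the rule only, assuming field descriptors of the form {"name": ..., "type": ...}; `merge_field_types` is a hypothetical helper and the example types are made up, so the library's actual logic may differ.

```python
def merge_field_types(*schemas):
    """Merge lists of {"name", "type"} field descriptors.

    A field keeps its type when every datapackage agrees on it and
    falls back to "string" when the types differ.
    """
    merged = {}
    for fields in schemas:
        for field in fields:
            name, field_type = field["name"], field["type"]
            if name in merged and merged[name] != field_type:
                merged[name] = "string"  # conflicting types: fall back
            else:
                merged.setdefault(name, field_type)
    return [{"name": n, "type": t} for n, t in merged.items()]


# Hypothetical schemas: "id" is an integer in one package and a string in the other.
first = [{"name": "id", "type": "integer"}, {"name": "title", "type": "string"}]
second = [{"name": "id", "type": "string"}, {"name": "title2", "type": "string"}]
print(merge_field_types(first, second))
```

Here id ends up as a string because the two packages disagree on its type, while title and title2 keep the type they were declared with.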